
    On the complexity of range searching among curves

    Modern tracking technology has made the collection of large numbers of densely sampled trajectories of moving objects widely available. We consider a fundamental problem encountered when analysing such data: Given $n$ polygonal curves $S$ in $\mathbb{R}^d$, preprocess $S$ into a data structure that answers queries with a query curve $q$ and radius $\rho$ for the curves of $S$ that have Fréchet distance at most $\rho$ to $q$. We initiate a comprehensive analysis of the space/query-time trade-off for this data structuring problem. Our lower bounds imply that any data structure in the pointer model that achieves $Q(n) + O(k)$ query time, where $k$ is the output size, has to use roughly $\Omega\left((n/Q(n))^2\right)$ space in the worst case, even if queries are mere points (for the discrete Fréchet distance) or line segments (for the continuous Fréchet distance). More importantly, we show that more complex queries and input curves lead to additional logarithmic factors in the lower bound. Roughly speaking, the number of logarithmic factors added is linear in the number of edges added to the query and input curve complexity. This means that the space/query-time trade-off worsens by a factor exponential in the input and query complexity. This behaviour addresses an open question in the range searching literature: whether it is possible to avoid the additional logarithmic factors in the space and query time of a multilevel partition tree. We answer this question negatively. On the positive side, we show that we can build data structures for the Fréchet distance by using semialgebraic range searching. Our solution for the discrete Fréchet distance is in line with the lower bound, as the number of levels in the data structure is $O(t)$, where $t$ denotes the maximal number of vertices of a curve. For the continuous Fréchet distance, the number of levels increases to $O(t^2)$.
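    To make the query semantics concrete, here is a minimal linear-scan baseline, not the data structure from the abstract: it evaluates the discrete Fréchet distance with the classical $O(|p|\,|q|)$ dynamic program and reports every curve within radius $\rho$. The names `discrete_frechet` and `range_query` are hypothetical. With a single-point query the distance reduces to the largest distance from a curve's vertices to that point, which is the simplest query type the lower bound already covers.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Curve = Sequence[Point]


def discrete_frechet(p: Curve, q: Curve) -> float:
    """Discrete Fréchet distance via the standard O(|p||q|) dynamic program."""
    n, m = len(p), len(q)
    # dp[i][j] = discrete Fréchet distance between prefixes p[:i+1] and q[:j+1]
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[i][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][j], d)
            else:
                dp[i][j] = max(min(dp[i - 1][j], dp[i - 1][j - 1], dp[i][j - 1]), d)
    return dp[n - 1][m - 1]


def range_query(curves: List[Curve], q: Curve, rho: float) -> List[int]:
    """Linear-scan baseline: indices of curves within discrete Fréchet distance rho of q."""
    return [i for i, c in enumerate(curves) if discrete_frechet(c, q) <= rho]
```

    For example, `range_query([[(0, 0), (1, 0)], [(0, 1), (1, 1)]], [(0, 0), (1, 0)], 1.0)` reports both curves. The scan costs $O(n t^2)$ per query; the data structures in the abstract trade extra space for avoiding it.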

    Probabilistic embeddings of the Fréchet distance

    The Fréchet distance is a popular distance measure for curves which naturally lends itself to fundamental computational tasks, such as clustering, nearest-neighbor searching, and spherical range searching in the corresponding metric space. However, its inherent complexity poses considerable computational challenges in practice. To address this problem, we study the distortion of the probabilistic embedding that results from projecting the curves to a randomly chosen line. Such an embedding could be used in combination with, e.g., locality-sensitive hashing. We show that in the worst case and under reasonable assumptions, the discrete Fréchet distance between two polygonal curves of complexity $t$ in $\mathbb{R}^d$, where $d \in \{2,3,4,5\}$, degrades by a factor linear in $t$ with constant probability. We show upper and lower bounds on the distortion. We also evaluate our findings empirically on a benchmark data set. The preliminary experimental results stand in stark contrast with our lower bounds. They indicate that highly distorted projections happen very rarely in practice, and only for strongly conditioned input curves.
    Keywords: Fréchet distance, metric embeddings, random projections.
    Comment: 27 pages, 11 figures
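    The embedding itself is easy to simulate. The sketch below assumes the random line is drawn uniformly from directions in $\mathbb{R}^d$ (via a normalized Gaussian vector; the abstract does not state the distribution), projects both curves onto it, and returns the discrete Fréchet distance before and after. All function names are hypothetical. Because projection onto a unit vector never increases Euclidean distances, the projected value is always at most the original one, so the distortion is the ratio original/projected.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Curve = Sequence[Point]


def discrete_frechet(p: Curve, q: Curve) -> float:
    """Discrete Fréchet distance via the standard dynamic program."""
    n, m = len(p), len(q)
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[i][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][j], d)
            else:
                dp[i][j] = max(min(dp[i - 1][j], dp[i - 1][j - 1], dp[i][j - 1]), d)
    return dp[n - 1][m - 1]


def random_unit_vector(d: int, rng: random.Random) -> List[float]:
    """Direction uniform on the unit sphere (assumption): a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(curve: Curve, u: List[float]) -> List[Point]:
    """Map every vertex to its coordinate along the line spanned by u (a 1-D curve)."""
    return [(sum(a * b for a, b in zip(p, u)),) for p in curve]


def projection_experiment(p: Curve, q: Curve, rng: random.Random) -> Tuple[float, float]:
    """Return (original distance, distance after projecting both curves to one random line)."""
    u = random_unit_vector(len(p[0]), rng)
    return discrete_frechet(p, q), discrete_frechet(project(p, u), project(q, u))
```

    Repeating `projection_experiment` over many seeds gives an empirical distribution of the distortion, which is the kind of quantity the experiments in the abstract measure.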

    Locality-Sensitive Hashing of Curves

    We study data structures for storing a set of polygonal curves in $\mathbb{R}^d$ such that, given a query curve, we can efficiently retrieve similar curves from the set, where similarity is measured using the discrete Fréchet distance or the dynamic time warping distance. To this end we devise the first locality-sensitive hashing schemes for these distance measures. A major challenge is posed by the fact that these distance measures internally optimize the alignment between the curves. We give solutions for different types of alignments, including constrained and unconstrained versions. For unconstrained alignments, we improve over a result by Indyk from 2002 for short curves. Let $n$ be the number of input curves and let $m$ be the maximum complexity of a curve in the input. In the particular case where $m \leq \frac{\alpha}{4d} \log n$, for some fixed $\alpha > 0$, our solutions imply an approximate near-neighbor data structure for the discrete Fréchet distance that uses space in $O(n^{1+\alpha}\log n)$, achieves query time in $O(n^{\alpha}\log^2 n)$, and has a constant approximation factor. Furthermore, our solutions provide a trade-off between approximation quality and computational performance: for any parameter $k \in [m]$, we can give a data structure that uses space in $O(2^{2k}m^{k-1} n \log n + nm)$, answers queries in $O(2^{2k} m^{k}\log n)$ time, and achieves an approximation factor in $O(m/k)$.
    Comment: Proc. of 33rd International Symposium on Computational Geometry (SoCG), 201
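    The abstract does not spell out the hash functions. One construction in the same spirit snaps each vertex to a randomly shifted grid and then collapses consecutive repeated cells, so that curves differing only in how their vertices are aligned can still share a key. The grid side `delta`, the number of tables, and every name below are our assumptions, and the sketch only produces candidates; a real query would still verify them with the exact distance.

```python
import math
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Curve = Sequence[Point]
Key = Tuple[Tuple[int, ...], ...]


def make_grid_hash(d: int, delta: float, rng: random.Random) -> Callable[[Curve], Key]:
    """One hash function: snap vertices to a grid of side delta with a random shift."""
    shift = [rng.uniform(0.0, delta) for _ in range(d)]

    def h(curve: Curve) -> Key:
        cells: List[Tuple[int, ...]] = []
        for p in curve:
            cell = tuple(math.floor((x + s) / delta) for x, s in zip(p, shift))
            # Collapse consecutive repeats: curves that differ only in how many
            # vertices fall into the same cell receive the same key.
            if not cells or cells[-1] != cell:
                cells.append(cell)
        return tuple(cells)

    return h


def build_index(curves: List[Curve], hashes) -> List[Dict[Key, List[int]]]:
    """One bucket table per hash function, mapping keys to curve indices."""
    tables = [defaultdict(list) for _ in hashes]
    for idx, curve in enumerate(curves):
        for table, h in zip(tables, hashes):
            table[h(curve)].append(idx)
    return tables


def candidates(tables, hashes, query: Curve) -> List[int]:
    """Indices of curves that collide with the query in at least one table."""
    found = set()
    for table, h in zip(tables, hashes):
        found.update(table.get(h(query), []))
    return sorted(found)
```

    Using several independently shifted grids raises the chance that a nearby curve collides with the query in at least one table, at the cost of more space; this is the usual amplification step in locality-sensitive hashing.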

    Finding Complex Patterns in Trajectory Data via Geometric Set Cover

    Clustering trajectories is a central challenge when confronted with large amounts of movement data such as full-body motion data or GPS data. We study a clustering problem that can be stated as a geometric set cover problem: Given a polygonal curve of complexity $n$, find the smallest number $k$ of representative trajectories of complexity at most $l$ such that any point on the input trajectories lies on a subtrajectory of the input that has Fréchet distance at most $\Delta$ to one of the representative trajectories. This problem was first studied by Akitaya et al. (2021) and Brüning et al. (2022). They present a bicriteria approximation algorithm that returns a set of curves of size $O(kl\log(kl))$ which covers the input with a radius of $11\Delta$ in time $\widetilde{O}((kl)^2n + kln^3)$, where $k$ is the smallest number of curves of complexity $l$ needed to cover the input with a distance of $\Delta$. The representative trajectories computed by their algorithm are always line segments. In applications, however, one is usually interested in representative curves of higher complexity which consist of several edges. We present a new approach that builds upon the works of Brüning et al. (2022), computing a set of curves of size $O(k\log n)$ in time $\widetilde{O}(l^2n^4 + kln^4)$ with the same distance guarantee of $11\Delta$, where each curve may have complexity up to the given complexity parameter $l$. To validate our approach, we conduct experiments on different types of real-world data: high-dimensional full-body motion data and low-dimensional GPS-tracking data.
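    A logarithmic factor in the output size, as in $O(k\log n)$, is what the classical greedy set cover algorithm guarantees: it always picks the set covering the most still-uncovered elements and uses at most $H(N) \le \ln N + 1$ times the optimum number of sets for a universe of $N$ elements. The sketch below shows only that selection step and is not claimed to be the paper's method. It assumes the geometric work is already done: the universe is a hypothetical discretization of the input into sample points, and each candidate representative curve comes with the set of points it covers within distance $\Delta$.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(universe: Set[int], candidates: Dict[Hashable, Set[int]]) -> List[Hashable]:
    """Repeatedly pick the candidate covering the most still-uncovered elements."""
    uncovered = set(universe)
    chosen: List[Hashable] = []
    while uncovered:
        if not candidates:
            raise ValueError("no candidate sets given")
        # Candidate whose coverage set hits the most uncovered points.
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("candidate sets do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen
```

    In this model the hard part of the problem is generating a small candidate family whose coverage sets are correct up to the $11\Delta$ relaxation; the greedy loop itself is cheap once those sets exist.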